Search CORE

11 research outputs found

Towards Scalable Synchronization on Multi-Cores

Author: Trigonakis Vasileios
Publication venue: Lausanne, EPFL
Publication date: 19/10/2016

The shift of commodity hardware from single- to multi-core processors in the early 2000s compelled software developers to take advantage of the parallelism that multi-cores offer. Unfortunately, only a few applications, the so-called embarrassingly parallel ones, can leverage this parallelism in a straightforward manner. The remaining, non-embarrassingly parallel applications require their processes to coordinate their possibly interleaved executions to ensure overall correctness: they require synchronization. Synchronization is achieved by constraining or even prohibiting parallel execution. Thus, per Amdahl's law, synchronization limits software scalability. In this dissertation, we explore how to minimize the effects of synchronization on software scalability. We show that the scalability of synchronization is mainly a property of the underlying hardware. This means that synchronization directly hampers the cross-platform performance portability of concurrent software. Nevertheless, we can achieve portability without sacrificing performance by creating design patterns and abstractions that implicitly leverage hardware details without exposing them to software developers. We first perform an exhaustive analysis of the performance behavior of synchronization on several modern platforms. This analysis clearly shows that the performance and scalability of synchronization are highly dependent on the characteristics of the underlying platform. We then focus on lock-based synchronization and analyze the energy/performance trade-offs of various waiting techniques. We show that the performance and the energy efficiency of locks go hand in hand on modern x86 multi-cores. This correlation is again due to the characteristics of the hardware, which does not provide practical tools for reducing the power consumption of locks without sacrificing throughput.
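The Amdahl's-law argument in the paragraph above can be made concrete. Suppose a fraction p of the work runs in parallel and the remaining 1 - p is serialized, for example by synchronization. Then the speedup on N cores is bounded as shown below. The second line is an illustrative example chosen here, not a measurement from the dissertation.

```latex
\begin{align*}
S(N) &= \frac{1}{(1 - p) + p/N}, & \lim_{N \to \infty} S(N) &= \frac{1}{1 - p} \\
p = 0.95:\quad S(64) &= \frac{1}{0.05 + 0.95/64} \approx 15.4, & \lim_{N \to \infty} S(N) &= 20
\end{align*}
```

In this example, 5% of serialized execution caps the speedup at 20 no matter how many cores are added.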
We then propose two approaches for developing portable and scalable concurrent software, hence hiding the limitations that the underlying multi-cores impose. First, we introduce OPTIK, a new practical design pattern for designing and implementing fast and scalable concurrent data structures. We illustrate the power of the OPTIK pattern by devising five new algorithms and by optimizing four state-of-the-art algorithms for linked lists, skip lists, hash tables, and queues. Second, we introduce MCTOP, a multi-core topology abstraction that includes low-level information, such as memory bandwidths. MCTOP enables developers to accurately and portably define high-level optimization policies. We illustrate several such policies through four examples, including automated backoff schemes for locks, and demonstrate the performance and portability of these policies on five platforms.

Infoscience - École polytechnique fédérale de Lausanne
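The thesis abstract above names the OPTIK pattern but does not describe how it works. As a rough illustration only, the C11 sketch below assumes an OPTIK-style lock that doubles as a version number. In this assumed scheme, an even value means the lock is free and an odd value means it is held, and every lock and unlock advances the version. A thread can do its preparation optimistically and remember the version it saw. It then locks only if nothing changed in the meantime, so a single compare-and-swap both validates and acquires. The `vlock_*` and `cell_*` names are hypothetical and are not the thesis's API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical OPTIK-style versioned lock (assumed encoding):
   even = free, odd = held; every lock and unlock increments the version. */
typedef struct {
    _Atomic uint64_t version;
} vlock_t;

static inline uint64_t vlock_get_version(vlock_t *l) {
    return atomic_load_explicit(&l->version, memory_order_acquire);
}

static inline bool vlock_is_locked(uint64_t v) {
    return (v & 1u) != 0;
}

/* Lock only if the version is still `seen`: one CAS validates and acquires. */
static inline bool vlock_trylock_version(vlock_t *l, uint64_t seen) {
    if (vlock_is_locked(seen))
        return false;
    return atomic_compare_exchange_strong_explicit(
        &l->version, &seen, seen + 1,
        memory_order_acq_rel, memory_order_relaxed);
}

static inline void vlock_unlock(vlock_t *l) {
    /* odd -> even: releases the lock and publishes a new version */
    atomic_fetch_add_explicit(&l->version, 1, memory_order_release);
}

/* Example client: a value protected by the versioned lock. */
typedef struct {
    vlock_t lock;
    long value;
} cell_t;

static void cell_add(cell_t *c, long delta) {
    for (;;) {
        uint64_t seen = vlock_get_version(&c->lock);
        /* An optimistic, lock-free preparation phase would run here. */
        if (vlock_trylock_version(&c->lock, seen)) {
            c->value += delta;
            vlock_unlock(&c->lock);
            return;
        }
        /* Version changed or lock held: restart with a fresh snapshot. */
    }
}
```

In a real data structure, the optimistic phase would be a lock-free traversal, such as locating a node in a list. A failed `vlock_trylock_version` means that another thread modified the protected region, so the operation restarts.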

Abstracting Multi-Core Topologies with MCTOP

Author: Chatzopoulos Georgios
Guerraoui Rachid
Harris Tim
Trigonakis Vasileios
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/04/2017

Portability and efficiency are usually antagonists in multi-core computing. To develop efficient code, one needs to take into account the topology of the target multi-cores (e.g., for locality). This clearly hampers code portability. In this paper, we show that you can have your cake and eat it too. We introduce MCTOP, an abstraction of multi-core topologies augmented with important low-level hardware information, such as memory bandwidths and communication latencies. We show how to automatically generate MCTOP using libmctop, our library that leverages the determinism of cache-coherence protocols to infer the topology of multi-cores using only latency measurements. MCTOP enables developers to accurately and portably define high-level performance optimization policies. We illustrate several such policies through four examples: (i-ii) thread placement in OpenMP and in a MapReduce library, (iii) a topology-aware mergesort algorithm, and (iv) automatic backoff schemes for locks. We illustrate the portability of these optimizations, achieved with low effort, on five processors from Intel, AMD, and Oracle.

Infoscience - École polytechnique fédérale de Lausanne
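The MCTOP abstract does not give libmctop's inference algorithm. The sketch below only illustrates the intuition behind inferring topology from latencies, under a stated assumption. The assumption is that pairwise core-to-core latencies fall into a few well-separated levels, because a coherence transfer between two given cores always costs about the same. Under that assumption, grouping cores whose mutual latency stays under a threshold recovers one level of the topology. Obtaining the latency matrix (e.g., by cache-line ping-pong between pinned threads) is out of scope here. `group_cores` and `NCORES` are hypothetical names.

```c
#define NCORES 4 /* hypothetical machine size for this sketch */

/* Greedy grouping: core i starts a new group and pulls in every still-unassigned
   core j whose measured latency to i is at most `threshold`. Returns the number
   of groups; group[k] receives the group index of core k. Assumes latencies form
   clean, well-separated levels. */
static int group_cores(const unsigned lat[NCORES][NCORES], unsigned threshold,
                       int group[NCORES]) {
    int ngroups = 0;
    for (int i = 0; i < NCORES; i++)
        group[i] = -1;
    for (int i = 0; i < NCORES; i++) {
        if (group[i] != -1)
            continue; /* already placed with an earlier core */
        group[i] = ngroups;
        for (int j = i + 1; j < NCORES; j++)
            if (group[j] == -1 && lat[i][j] <= threshold)
                group[j] = ngroups;
        ngroups++;
    }
    return ngroups;
}
```

Running the function with increasing thresholds yields progressively coarser groups, for instance cores sharing a socket and then the whole machine. This nesting is the kind of hierarchy a topology abstraction exposes.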

Designing ASCY-compliant Concurrent Search Data Structures

Author: Che Tong
David Tudor Alexandru
Guerraoui Rachid
Trigonakis Vasileios
Publication venue: Lausanne
Publication date: 18/12/2014

This report details the design of two new concurrent data structures: a hash table called CLHT and a binary search tree (BST) called BST-TK. Both designs are based on asynchronized concurrency (ASCY), a paradigm consisting of four complementary programming patterns. ASCY calls for the design of concurrent search data structures to resemble that of their sequential counterparts. CLHT (cache-line hash table) uses cache-line-sized buckets and performs in-place updates. Since the cache line is the granularity at which cache-coherence protocols operate, CLHT ensures that most operations complete with at most one cache-line transfer. BST-TK reduces the number of cache-line transfers by acquiring fewer locks than existing BSTs.

Infoscience - École polytechnique fédérale de Lausanne
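The following is a minimal, single-threaded sketch of the cache-line bucket idea described in the report abstract. It assumes 64-byte cache lines and 64-bit keys and values. Each bucket holds a lock word, three keys, three values, and an overflow pointer, which fits exactly one 64-byte line on a 64-bit platform. A lookup that hits the first bucket therefore touches a single line. The field layout, the use of key 0 as "empty", and all names are assumptions made for illustration. The report's CLHT also handles concurrency, so the `lock` field is unused here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64
#define ENTRIES_PER_BUCKET 3

/* Hypothetical CLHT-like bucket: 8 + 3*8 + 3*8 + 8 = 64 bytes on a 64-bit
   platform, so the whole bucket occupies a single cache line. */
typedef struct bucket {
    _Alignas(CACHE_LINE) uint64_t lock; /* unused in this single-threaded sketch */
    uint64_t key[ENTRIES_PER_BUCKET];   /* 0 marks an empty slot */
    uint64_t val[ENTRIES_PER_BUCKET];
    struct bucket *next;                /* overflow bucket, also one line */
} bucket_t;

_Static_assert(sizeof(bucket_t) == CACHE_LINE, "a bucket must span one cache line");

static bool bucket_get(const bucket_t *b, uint64_t key, uint64_t *out) {
    if (key == 0)
        return false;
    for (; b != NULL; b = b->next)
        for (int i = 0; i < ENTRIES_PER_BUCKET; i++)
            if (b->key[i] == key) {
                *out = b->val[i];
                return true;
            }
    return false;
}

/* In-place insert into the first free slot; chain a new line-sized bucket
   only when every slot in the chain is taken. */
static bool bucket_put(bucket_t *b, uint64_t key, uint64_t val) {
    uint64_t ignored;
    if (key == 0 || bucket_get(b, key, &ignored))
        return false;
    for (;;) {
        for (int i = 0; i < ENTRIES_PER_BUCKET; i++)
            if (b->key[i] == 0) {
                b->val[i] = val;
                b->key[i] = key;
                return true;
            }
        if (b->next == NULL) {
            bucket_t *nb = aligned_alloc(CACHE_LINE, sizeof *nb);
            if (nb == NULL)
                return false;
            memset(nb, 0, sizeof *nb);
            b->next = nb;
        }
        b = b->next;
    }
}
```

Overflow buckets come from `aligned_alloc`, so they also start on a line boundary. Without that alignment, a bucket could straddle two lines and a lookup would need two transfers.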

Unlocking Energy

Author: Falsafi Babak
Guerraoui Rachid
Picorel Obando Javier
Trigonakis Vasileios
Publication venue: Berkeley, Usenix Assoc
Publication date: 06/09/2016

Locks are a natural place for improving the energy efficiency of software systems. First, concurrent systems are mainstream, and when their threads synchronize, they typically do so with locks. Second, locks are well-defined abstractions, so the algorithm implementing them can be changed without modifying the system. Third, some locking strategies consume more power than others, so the choice of strategy can have a real effect. Last but not least, as we show in this paper, improving the energy efficiency of locks goes hand in hand with improving their throughput: it is a win-win situation. We make our case for this throughput/energy-efficiency correlation through a series of observations obtained from an exhaustive analysis of the energy efficiency of locks on two modern processors and six software systems: Memcached, MySQL, SQLite, RocksDB, HamsterDB, and Kyoto Cabinet. We propose simple lock-based techniques that improve the energy efficiency of these systems by 33% on average, driven by higher throughput, and without modifying the systems.

Infoscience - École polytechnique fédérale de Lausanne
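To make the "locking strategy" point in the abstract concrete, here is a minimal sketch of one common waiting technique, spin-then-yield. It is not one of the paper's locks. Waiters first spin on a plain load (test-and-test-and-set). After a fixed budget of failed attempts, they call POSIX `sched_yield()` to hand the core to another thread. `SPIN_BUDGET` and the `stl_*` names are hypothetical. The strategy lives entirely inside the lock, so callers of `stl_lock` and `stl_unlock` are unaffected when it changes, which mirrors the abstract's second point.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_BUDGET 1024 /* hypothetical: failed attempts before yielding */

typedef struct {
    atomic_bool held;
} stl_lock_t;

static bool stl_trylock(stl_lock_t *l) {
    /* exchange returns the previous value: false means we took the lock */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void stl_lock(stl_lock_t *l) {
    for (;;) {
        /* Spin phase: poll with plain loads so waiters share the cache line
           read-only, and attempt the atomic exchange only when it looks free. */
        for (int i = 0; i < SPIN_BUDGET; i++) {
            if (!atomic_load_explicit(&l->held, memory_order_relaxed) &&
                stl_trylock(l))
                return;
        }
        /* Yield phase: let the scheduler run another thread on this core. */
        sched_yield();
    }
}

static void stl_unlock(stl_lock_t *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Whether spinning or yielding pays off in throughput and energy depends on the platform. That platform dependence is the trade-off the paper measures.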

FloDB: Unlocking Memory in Persistent Key-Value Stores

Author: Balmau Oana Maria
Guerraoui Rachid
Trigonakis Vasileios
Zablotchi Mihail Igor
Publication venue: Association for Computing Machinery (ACM)
Publication date: 13/07/2018

Log-structured merge (LSM) data stores make it possible to store and process large volumes of data while maintaining good performance. They mitigate the I/O bottleneck by absorbing updates in a memory layer and transferring them to the disk layer in sequential batches. Yet, the LSM architecture fundamentally requires elements to be kept in sorted order. As the amount of data in memory grows, maintaining this sorted order becomes increasingly costly. Contrary to intuition, existing LSM systems can actually lose throughput with larger memory components. In this paper, we introduce FloDB, an LSM memory component architecture that allows throughput to scale on modern multicore machines with ample memory. The main idea underlying FloDB is to bootstrap the traditional LSM architecture by adding a small in-memory buffer layer on top of the memory component. This buffer offers low-latency operations, masking the write latency of the sorted memory component. Integrating this buffer into the classic LSM memory component to obtain FloDB is not trivial and requires revisiting the algorithms of the user-facing LSM operations (search, update, scan). FloDB's two layers can be implemented with state-of-the-art, highly concurrent data structures. This way, as we show in the paper, FloDB eliminates significant synchronization bottlenecks in classic LSM designs while offering a rich LSM API. We implement FloDB as an extension of LevelDB, Google's popular LSM key-value store, and compare its performance to that of state-of-the-art LSMs. In short, FloDB's performance is up to one order of magnitude higher than that of the next best-performing competitor in a wide range of multi-threaded workloads.

Infoscience - École polytechnique fédérale de Lausanne
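The following is a single-threaded sketch of the two-layer memory component described in the FloDB abstract, under simplifying assumptions. A tiny unsorted buffer absorbs writes with a scan and an append, and is drained into a sorted array only when it fills up. Reads check the buffer first, because it holds the newest values. FloDB's real layers are highly concurrent data structures. The capacities, the `ml_*` names, and the plain sorted array standing in for the sorted memory component are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_CAP 4       /* hypothetical: tiny unsorted write buffer */
#define SORTED_CAP 1024 /* hypothetical: sorted memory component */

typedef struct {
    uint64_t key, val;
} kv_t;

typedef struct {
    kv_t buf[BUF_CAP];       /* unsorted, one entry per key, newest data */
    size_t nbuf;
    kv_t sorted[SORTED_CAP]; /* kept in ascending key order */
    size_t nsorted;
} memlayer_t;

/* Index of the first sorted entry whose key is >= key. */
static size_t lower_bound(const memlayer_t *m, uint64_t key) {
    size_t lo = 0, hi = m->nsorted;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->sorted[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert or overwrite in the sorted layer: the costly, order-maintaining step. */
static bool sorted_upsert(memlayer_t *m, kv_t e) {
    size_t i = lower_bound(m, e.key);
    if (i < m->nsorted && m->sorted[i].key == e.key) {
        m->sorted[i].val = e.val;
        return true;
    }
    if (m->nsorted == SORTED_CAP)
        return false; /* a real LSM would flush to disk instead */
    memmove(&m->sorted[i + 1], &m->sorted[i], (m->nsorted - i) * sizeof(kv_t));
    m->sorted[i] = e;
    m->nsorted++;
    return true;
}

/* Move buffered entries into the sorted layer; true if the buffer is now empty. */
static bool ml_drain(memlayer_t *m) {
    size_t moved = 0;
    while (moved < m->nbuf && sorted_upsert(m, m->buf[moved]))
        moved++;
    /* keep any entries that did not fit */
    memmove(m->buf, &m->buf[moved], (m->nbuf - moved) * sizeof(kv_t));
    m->nbuf -= moved;
    return m->nbuf == 0;
}

static bool ml_put(memlayer_t *m, uint64_t key, uint64_t val) {
    for (size_t i = 0; i < m->nbuf; i++)
        if (m->buf[i].key == key) {
            m->buf[i].val = val; /* overwrite in place: one entry per key */
            return true;
        }
    if (m->nbuf == BUF_CAP)
        ml_drain(m);
    if (m->nbuf == BUF_CAP)
        return false;
    m->buf[m->nbuf++] = (kv_t){key, val};
    return true;
}

static bool ml_get(const memlayer_t *m, uint64_t key, uint64_t *out) {
    for (size_t i = 0; i < m->nbuf; i++) /* buffer first: newest values */
        if (m->buf[i].key == key) {
            *out = m->buf[i].val;
            return true;
        }
    size_t i = lower_bound(m, key);
    if (i < m->nsorted && m->sorted[i].key == key) {
        *out = m->sorted[i].val;
        return true;
    }
    return false;
}
```

The split batches the order-maintaining work (the `memmove` into the sorted array) at drain time instead of paying it on every write.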

CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure

Author: Chafi Hassan
Chiadmi Dalila
Firmli Soukaina
Hong Sungpack
Lozi Jean-Pierre
Psaroudakis Iraklis
Trigonakis Vasileios
Weld Alexander
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Principles of Distributed Systems (OPODIS 2020)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Case Report: Endoscopic radiofrequency ablation with radial-EBUS and ROSE

Author: Dimitrios Matthaios
Eleni-Isidora Perdikouri
Konstantinos Porpodis
Kostas Trigonakis
Nikolaos Courcoutsakis
Paul Zarogoulidis
Vasileios Papadopoulos
Wolfgang Hohenforst-Schmidt
Publication venue: Frontiers Media SA
Publication date: 01/01/2023

Background: Single pulmonary nodules are a common issue in everyday clinical practice. Currently, navigation systems with radial endobronchial ultrasound and electromagnetic navigation are available for obtaining biopsies, and rapid on-site evaluation can be used for a quick assessment. These small lesions are important to investigate even when positron emission tomography yields no clinically significant information about them.
Case description: Radiofrequency and microwave ablation have been evaluated as local treatment techniques. These techniques can be used as therapy for patients who cannot undergo surgery. Currently, one verified system is used for endoscopic radiofrequency ablation through the working channel of a bronchoscope.
Conclusion: In our case, a new system was used to perform radiofrequency ablation, with long-term follow-up.

Directory of Open Access Journals

Design of a Distributed Transactional Memory for Many-core systems

Author: Trigonakis Vasileios
Publication venue: KTH, Skolan för informations- och kommunikationsteknik (ICT)
Publication date: 01/01/2011

The emergence of multi-/many-core systems signified an increasing need for parallel programming. Transactional Memory (TM) is a promising programming paradigm for creating concurrent applications. To date, the design of Distributed TM (DTM) tailored to non-coherent many-core architectures is largely unexplored. This thesis addresses this topic by analysing, designing, and implementing a DTM system suitable for low-latency message-passing platforms. The resulting system, named SC-TM (the Single-Chip Cloud TM), is a fully decentralized and scalable DTM implemented on Intel's SCC processor, a 48-core 'concept vehicle' created by Intel Labs as a platform for many-core software research. SC-TM is one of the first fully decentralized DTMs that guarantee starvation-freedom and the first to use an actual pluggable Contention Manager (CM) to ensure liveness. Finally, this thesis introduces three completely decentralized CMs: Offset-Greedy, a decentralized version of Greedy; Wholly, which relies on the number of completed transactions; and FairCM, which makes use of the effective transactional time. The evaluation showed that FairCM performed best of the three.

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line
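The sketch below illustrates what a pluggable contention manager means. It models a CM as a function pointer that a TM runtime would call on a conflict, returning which side aborts. `cm_greedy` is a simplified Greedy-style rule in which the older transaction, the one with the smaller start timestamp, wins. The original Greedy also lets transactions wait, which is omitted here. `cm_fewer_commits` is a hypothetical count-based policy included only to show policies being swapped. It is not claimed to be Wholly's exact rule. All names and fields are assumptions.

```c
#include <stdint.h>

typedef struct {
    int core_id;        /* tie-breaker: distinct per core */
    uint64_t start_ts;  /* start timestamp of the transaction */
    uint64_t committed; /* transactions this core has completed so far */
} tx_info_t;

typedef enum { ABORT_SELF, ABORT_OTHER } cm_decision_t;

/* A pluggable contention manager: called on a conflict between self and other. */
typedef cm_decision_t (*cm_fn)(const tx_info_t *self, const tx_info_t *other);

static cm_decision_t tie_break(const tx_info_t *self, const tx_info_t *other) {
    return self->core_id < other->core_id ? ABORT_OTHER : ABORT_SELF;
}

/* Simplified Greedy-style rule: the older transaction (smaller timestamp) wins. */
static cm_decision_t cm_greedy(const tx_info_t *self, const tx_info_t *other) {
    if (self->start_ts != other->start_ts)
        return self->start_ts < other->start_ts ? ABORT_OTHER : ABORT_SELF;
    return tie_break(self, other);
}

/* Hypothetical count-based rule: favor the core that has committed less work. */
static cm_decision_t cm_fewer_commits(const tx_info_t *self, const tx_info_t *other) {
    if (self->committed != other->committed)
        return self->committed < other->committed ? ABORT_OTHER : ABORT_SELF;
    return tie_break(self, other);
}
```

Because the runtime only holds a `cm_fn`, switching policies is a single assignment (for example, `cm_fn active = cm_greedy;`). The deterministic tie-break ensures that two conflicting transactions never both decide to abort the other.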

TM2C: a software transactional memory for many-cores

Author: Gramoli Vincent
Guerraoui Rachid
Trigonakis Vasileios
Publication venue: Springer Science and Business Media LLC
Publication date: 08/11/2018

Infoscience - École polytechnique fédérale de Lausanne

Lock–Unlock: Is That All? A Pragmatic Analysis of Locking in Software Systems

Author: Guerraoui Rachid
Guiroux Hugo
Lachaize Renaud
Quéma Vivien
Trigonakis Vasileios
Publication venue: Association for Computing Machinery (ACM)
Publication date: 28/03/2019

International audience

Hal - Université Grenoble Alpes